Automatic Recognition of Verbal Polysemy

Authors

  • Fumiyo Fukumoto
  • Jun'ichi Tsujii
Abstract

Polysemy is one of the major causes of difficulties in semantic clustering of words in a corpus. In this paper, we first give a definition of polysemy from the viewpoint of clustering and then, based on this definition, we propose a clustering method which recognises verbal polysemies from a textual corpus. The results of experiments demonstrate the effectiveness of the proposed method.

1 Introduction

There has been quite a lot of research concerned with automatic clustering of semantically similar words or automatic recognition of collocations among them from corpora [Church, 1991], [Hindle, 1991], [Smadja, 1991]. Most of this work is based on similarity measures derived from the distribution of words in corpora. However, the facts that a single word does have more than one meaning and that the distribution of a word in a corpus is a mixture of usages of different meanings of the same word often hamper such attempts.

The meaning of a word depends on the domain in which it is used; the same word can be used differently in different domains. It is also often the case that a word which is polysemous in general is not polysemous in a restricted subject domain. In general, restriction of the subject domain makes the problem of polysemy less problematic. However, even in texts from a restricted domain such as Wall Street Journal¹, one encounters quite a large number of polysemous words. In particular, unlike nouns, verbs are often polysemous even in a restricted subject domain.

Because polysemous verbs are usually also high-frequency verbs, their treatment is crucial in actual applications. Furthermore, because of their high frequency, polysemous verbs tend to have a harmful influence on the semantic clustering of nouns, because semantic clustering of nouns is usually performed based on their collocational behaviour with verbs.
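The point about mixed distributions can be illustrated with a minimal sketch. It is not the method proposed in the paper: the verbs, nouns and counts below are invented for illustration, and cosine similarity is an assumed stand-in for whatever distributional similarity measure a given clustering system uses.

```python
from math import sqrt

# Hypothetical verb -> {object noun: count} co-occurrence counts,
# invented purely for illustration (not taken from any corpus).
COUNTS = {
    "take":    {"bus": 4, "train": 3, "medicine": 4, "pill": 3},
    "ride":    {"bus": 5, "train": 4},
    "swallow": {"medicine": 5, "pill": 4},
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarities(counts):
    """Return the cosine similarity for every unordered pair of verbs."""
    verbs = sorted(counts)
    return {
        (v, w): cosine(counts[v], counts[w])
        for i, v in enumerate(verbs)
        for w in verbs[i + 1:]
    }


if __name__ == "__main__":
    for pair, sim in similarities(COUNTS).items():
        print(pair, round(sim, 3))
```

With these toy counts, `take` is equally similar to `ride` and to `swallow`, while `ride` and `swallow` share no objects at all. A clustering that assigns each verb to exactly one group has to pick one side for `take`, which is the kind of distortion a mixture of usages introduces.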
* FUKUMOTO is now at Department of Electrical Engineering and Computer Science, Faculty of Engineering, Yamanashi University. E-mail: fukumoto@skye.esi.yamanashi.ac.jp

¹ Wall Street Journal was prepared by ACL (Association for Computational Linguistics' Data Collection Initiative) in 1991.

Although polysemy is said to be widespread in language, the definition of polysemy is highly subjective. Polysemy can only be recognised by human intuition and different linguists often identify a different number of senses in the same word. In this paper, we first give a definition of polysemy from the viewpoint of clustering, and propose an overlapping clustering method which automatically recognises polysemous words. The results of experiments are also given to demonstrate the effectiveness of our method.

2 Related Work

Although there have been several attempts to extract semantically similar words from a given corpus, few studies seriously deal with the problem of polysemy; of these, even fewer are based on real texts. The techniques developed by Zernik [Zernik, 1991] and Brown [Brown, 1991] seem to cope with the discrimination of polysemy and be based on real texts.

Zernik used monolingual texts which consist of about 1 million words tagged by part-of-speech. His method associates each word sense of a polysemous word with a set of its co-occurring words. If a word has several senses, then the word is associated with several different sets of co-occurring words, each of which corresponds to one of the senses of the word. The limitation of Zernik's method, however, is that it solely relies on human intuition for identifying different senses of a word, i.e. the human editor has to determine, by her/his intuition, how many senses a word has, and then identify the sets of co-occurring words (signatures) that correspond to the different senses.

Brown used bilingual texts, which consist of 12 million words.
The results of Brown's technique, when applied to a French-English machine translation system, seem to show its effectiveness and validity. However, as he admits, the approach is limited because it can only assign at most two senses to a word. More seriously, polysemy is defined in terms of translation, i.e. only when a word is translated into two different words in a target language is it recognised as polysemous. The approach can be used only when a large parallel corpus is available. Furthermore, individual senses thus identified do not necessarily constitute single semantic units in the monolingual domain to which plausible semantic properties (i.e. semantic restrictions,

Similar resources

Semantico-cognitive Representation of Meanings Application to Verbal Polysemy

This communication describes the theoretical and practical tools that permit the automatic construction of a verbal lexicon, with a view to didactic use by linguists and automatic treatment of written texts. Firstly, we present the theoretical choices made for the representation of verbal polysemy with a view to practical use; and secondly, we present some elements of a semi-au...

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among these, lip-reading techniques have encountered many challenges for speech recognition, that one of the challenges b...

Verbal Polysemy Resolution through Contextualized Clustering of Arguments

A dissertation presented to the Faculty of the Graduate School of Arts and Sciences of Brandeis University, Waltham, Massachusetts, by Anna A. Rumshisky. Natural language is characterized by a high degree of polysemy, and the majority of content words accept multiple interpretations. However, this does not significantly com...

Advances in Automatic Speech Recognition by Imitating Spreading Activation

Inspired by recent insights into the properties of statistical word co-occurrences, we propose a mechanism which imitates spreading activation in the human mind in order to improve the identification of words during the automatic speech recognition process. This mechanism is able to make accurate semantic predictions about the currently uttered word as well as about words which are likely to c...

Automatic Emotion Recognition Using Facial Expression: A Review

The objective of this paper is to introduce the needs and applications of facial expression recognition. Between verbal and non-verbal forms of communication, facial expression is a form of non-verbal communication, but it plays a pivotal role. It expresses human perspe...

Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)

This paper shows how we can take advantage of genetic programming in the selection of suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, selection of suitable features may significantly affect the performance of the process. Simulations were conducted with 5 dB and 10 dB SNRs. Test and ...

Publication date: 1994